Project 4: Emotions of JMU¶

Understanding the Place where we Study¶

[Names]¶

📚 Table of Contents¶

📋 Getting Started¶

  • Instructions
  • Part 1: Research Overview
    • 1.1 Corpus Description
    • 1.2 Hypothesis

🔍 Data Exploration¶

  • Part 2: Exploratory Data Analysis
    • 2.1 Visualization 1
    • 2.2 Visualization 2

🧹 Data Processing¶

  • Part 3: Data Cleaning and Refinement
    • 3.1 Toponym Misalignment Analysis
    • 3.2 Toponym Refinement
    • 3.3 Revised Map
    • 3.4 Map Customization

🗺️ Spatial Analysis¶

  • Part 4: Spatial Comparison
    • 4.1 JMU Spatial Distribution
    • 4.2 Spatial Analysis

😊 Sentiment Analysis¶

  • Part 5: Sentiment Analysis Comparison

⏰ Time Series Analysis¶

  • Part 6: Time Series Animation Analysis

📝 Final Report¶

  • Part 7: Conclusion and Future Research

📋 Assignment Overview¶

We have spent the last several weeks trying to reproduce the work of Ryan Heuser in "The Emotions of London". Rather than London, we have scraped JMU's Reddit feed to establish the emotions of JMU by first geoparsing the data for locations, cleaning those locations, and then attaching emotions to them. Our ultimate goal was to get a better sense of how JMU students think of the place where they live and study. In particular, we are interested in what the world looks like from the perspective of the JMU students who post to Reddit. Using our data, we can start to answer such questions as:

  • What places are often talked about?
  • How are they talked about?
  • What might this tell us about how JMU works as an imagined community?

In this visual essay, you are going to create a fuller picture of the world as perceived by JMU students by contrasting it with another school in Virginia. Undoubtedly, dorms, food, and parking are the concerns of every college student, but by looking at contrasting data sets we can try to figure out what makes the JMU community unique compared to other schools.

📝 Instructions¶

The following is a self-contained visual essay. Your task is to complete it as a group.

Each section has its own set of instructions and tasks that fall into the following categories:

  • 📊 Action Items = Data exploration and technical tasks
  • ✍️ Writing Tasks = Analysis and interpretation assignments
  • 🔧 Technical Steps = Code implementation instructions
  • 💡 Examples = Sample content and guidance
  • ⚠️ Important = Critical requirements and warnings

Each part of the essay draws on a previous skill we have learned, and while you can give ownership of each part to a different team member, it is best to walk through the entire essay with your team twice. First, at the beginning to get a sense of the data and your potential hypothesis, and second, as a final run through to make sure all the parts are coherent. Once you have completed all the sections, delete all of the instructions and leave only your writing, the code, and the visualizations.

⏰ Timeline¶

Final Essay is Due Tuesday Nov 11th

  1. Nov. 3rd - Assignment Introduction - Complete Parts 1-2
    • Homework - Complete Parts 3-4
  2. Nov. 5th - In-class work day - Convene about Parts 3-4, then move on to Parts 5 and 6
  3. Nov. 10 - Discuss Krygier and Wood Reading - Final Crunch Last 30 Minutes
  4. Nov. 12 - Final Project Introduction: Deep Mapping Harrisonburg

Part 1: Research Overview¶

Introduction¶

Your introduction should include:

  • Brief overview of the two corpora you are studying (i.e., JMU and your assigned institution)
  • Your hypothesis about these corpora
  • Your findings based on
    • Exploratory Data Analysis (Voyant)
    • Your major finding using spatial analysis
    • Your major finding using sentiment maps
  • Overall conclusion

Each element should be about a sentence or two. The introduction should be written last.

1.1 Corpus Description¶

📊 Action Items¶

📋 Data Exploration

  • Open and inspect: assets\data\jmu_reddit_geoparsed_clean.csv
  • Your own institution data set: group_data_packets\group_NUMBER\python\INSTITUTION_processed.csv
    • For example: group_data_packets\group_6\python\UNC_processed.csv

✍️ Writing Task¶

Skim through the records and try to get a sense of what the data is about. Come up with a working theory as to what might be an important point of comparison between your institution and JMU.

Write one paragraph in which you compare and contrast the two corpora.

1.2 Hypothesis¶

✍️ Writing Task¶

Formulate a working theory of how the two corpora differ or compare and explain how you might see that reflected in the data.

⚠️ Important Requirements:

  • The hypothesis must be related to space and sentiment
  • Example of what NOT to do: "JMU students love math classes more than UNC students" (not spatial and cannot be answered by our data)

💡 Sample Hypotheses¶

Regional Scale Examples:

  1. UNC attracts more students from the southeast than JMU and therefore more positive sentiments will appear across South Carolina, Georgia, Alabama, and Florida.

Local Scale Examples:

  1. JMU has a very positive college atmosphere, and therefore sentiment around campus will appear more positive than that of UNC.

Note that these hypotheses operate at two different spatial scales: regional and local.

Part 2: Exploratory Data Analysis¶

📊 Action Items¶

📋 Data Preparation

  • Use the zip file in your group directory (group_data_packets/voyant/INSTITUTION_JMU.zip) to run Voyant analysis at https://www.voyant-tools.org

✍️ Writing Task¶

Write a brief overview of the type of evidence you are hoping to find with your Voyant analysis.

💡 Example: "We will compare the difference in 'school spirit' between UNC and JMU by looking at terms related to sports, social events, and mascots in the trends tool and the context tool."

2.1 Visualization 1 [Change Title Depending on Visualization]¶

✍️ Writing Task¶

Explain what this visualization does and how it might help your research.

📊 Add Visualization 1 Here¶

🔧 Technical Steps:

  1. Create the visualization you want in Voyant Tools
  2. Click on the export button at the top right of the visualization
  3. Open Export View (Tools and Data) dropdown
  4. Select the "an HTML snippet for embedding this view in another web page" radio button
  5. Copy the snippet that starts with <iframe> and ends with the closing tag </iframe>
  6. Replace the whole string in the markdown cell below

📺 Video Tutorial: https://youtu.be/FFHaNLVLmz4?si=MJxohF7vd2PRx9Ox

✍️ Visualization Analysis¶

Write how the visualization confirms or complicates your hypothesis.

2.2 Visualization 2 [Change Title Depending on Visualization]¶

✍️ Writing Task¶

Explain what this visualization does and how it might help your research.

Visualization Analysis¶

✍️ Write how the visualization confirms or complicates your hypothesis.

Part 3: Data Cleaning and Refinement¶

📋 About _processed.csv Files¶

Each group has been provided with a _processed.csv file. This file includes all of the necessary data to create sentiment maps. The catch is that this spatial data is still in raw form and contains erroneous entries. In this section, you will take your _processed.csv file and clean it up.

For the sake of example, a UNC dataset has been preloaded to show what the result should look like.

⚠️ Note: Delete this cell before turning in the final version.

In [14]:
# =============================================================================
# SETUP: Import Libraries and Load Data
# =============================================================================
# This cell sets up all the tools we need for spatial sentiment analysis

# Force reload to pick up any changes to data_cleaning_utils
# This ensures we get the latest version of our custom functions
import importlib
import sys
if 'data_cleaning_utils' in sys.modules:
    importlib.reload(sys.modules['data_cleaning_utils'])

# Core data analysis library - like Excel but for Python
import pandas as pd

# Import our custom functions for cleaning and analyzing location data
from data_cleaning_utils import (
    clean_institution_dataframe,      # Standardizes and cleans location data
    get_data_type_summary,            # Shows what types of data we have
    get_null_value_summary,           # Identifies missing data
    create_location_counts,           # Counts how often places are mentioned
    create_location_sentiment,        # Calculates average emotions by location
    create_time_animation_data,       # Prepares data for animated time series
)

# Interactive plotting library - creates maps and charts
import plotly.express as px
import plotly.graph_objects as go
import plotly.io as pio
import plotly.offline as pyo

# =============================================================================
# CONFIGURE PLOTLY FOR HTML EXPORT
# =============================================================================
# Configure Plotly for optimal HTML export compatibility

# Method 1: Set renderer for HTML export (use 'notebook' for Jupyter environments)
pio.renderers.default = "notebook"

# Method 2: Configure Plotly for offline use (embeds JavaScript in HTML)
pyo.init_notebook_mode(connected=False)  # False = fully offline, no external dependencies

# Method 3: Set template for clean HTML appearance
pio.templates.default = "plotly_white"




# Load the cleaned JMU Reddit data (already processed and ready to use)
# This contains: posts, locations, coordinates, sentiment scores, and dates
df_jmu = pd.read_pickle("assets/data/jmu_reddit_geoparsed_clean.pickle")
In [15]:
# =============================================================================
# LOAD YOUR INSTITUTION'S DATA
# =============================================================================
# Replace the group number and institution name with your assigned data

# 📝 TO DO: Update these paths for your group
# Replace "group_6" with your group number (e.g., "group_1", "group_2", etc.)
# Replace "UNC_processed.csv" with your institution's file name
df_institution = pd.read_csv("group_data_packets/group_6/python/UNC_processed.csv")
In [16]:
# =============================================================================
# CREATE RAW LOCATION MAP (Before Cleaning)
# =============================================================================
# This shows the "messy" data before we fix location errors
# You'll see why data cleaning is essential!

# STEP 1: Count how many times each place is mentioned
# Group identical place names together and count occurrences
place_counts = df_institution.groupby('place').agg({
    'place': 'count',           # Count how many times each place appears
    'latitude': 'first',        # Take the first latitude coordinate for each place
    'longitude': 'first',       # Take the first longitude coordinate for each place
    'place_type': 'first'       # Take the first place type classification
}).rename(columns={'place': 'count'})  # Rename the count column for clarity

# STEP 2: Prepare data for mapping
# Reset index makes 'place' a regular column instead of an index
place_counts = place_counts.reset_index()

# Remove any places that don't have valid coordinates (latitude/longitude)
# This prevents errors when trying to plot points on the map
place_counts = place_counts.dropna(subset=['latitude', 'longitude'])

# STEP 3: Create interactive scatter map
# Each dot represents a place, size = how often it's mentioned
fig = px.scatter_map(
    place_counts,                    # Our prepared data
    lat='latitude',                  # Y-coordinate (north-south position)
    lon='longitude',                 # X-coordinate (east-west position)
    size='count',                    # Bigger dots = more mentions
    hover_name='place',              # Show place name when hovering
    hover_data={                     # Additional info in hover tooltip
        'count': True,               # Show mention count
        'place_type': True,          # Show what type of place it is
        'latitude': ':.4f',          # Show coordinates with 4 decimal places
        'longitude': ':.4f'
    },
    size_max=25,                     # Maximum dot size on map
    zoom=4,                          # How zoomed in the map starts (higher = closer)
    title='Raw Location Data: Places Mentioned in UNC Reddit Posts',
    center=dict(lat=35.5, lon=-80)   # Center map on North Carolina for UNC
)

# STEP 4: Customize map appearance
fig.update_layout(
    map_style="carto-positron",      # Clean, light map style
    width=800,                       # Map width in pixels
    height=600,                      # Map height in pixels
    title_font_size=16,              # Title text size
    title_x=0.5                      # Center the title
)


# Configure for HTML export compatibility
fig.show(config={'displayModeBar': True, 'displaylogo': False})

3.1 Toponym Misalignment Analysis¶

📊 Action Items¶

Study the map of your institution alongside the .csv file. Try to determine some of the major mistakes the geoparser has made.

💡 Example: In the map above, Carolina is placed on the border between North Carolina and South Carolina. In all likelihood, Reddit posters mean North Carolina when they post "Carolina".

✍️ Writing Task¶

Write a description of the major toponym misalignments in your dataset.

💡 Sample Description: "The University of North Carolina's home campus is in Chapel Hill, NC and most locations should appear around those coordinates. There are some major erroneous locations, however. In the main, Chapel Hill has been placed in Tennessee, and likewise there is a Hill Chapel in Virginia, but this likely refers to Chapel Hill in NC as well."

3.2 Toponym Refinement¶

In this section you are going to clean up the map above by fixing some of the major locations and getting a sense of the data.

📺 Complete Video Tutorial: https://www.youtube.com/watch?v=TZgRYn1TxGI

🔧 Step 1: Access Your Data¶

  1. Navigate to your group in group_data_packets
  2. Go to the python folder
  3. Open the CSV file that ends with _processed.csv (e.g., GMU_processed.csv)

📋 Step 2: Data Structure¶

Your CSV file should contain the following 14 columns:

  • school_name - Institution identifier
  • unique_id - Record identifier
  • date - Post date
  • sentences - Text content
  • roberta_compound - Sentiment score
  • place - Original location name
  • latitude - Original coordinates
  • longitude - Original coordinates
  • revised_place (empty) - Corrected location name
  • revised_latitude (empty) - Corrected latitude
  • revised_longitude (empty) - Corrected longitude
  • place_type (empty) - Location category
  • false_positive (empty) - Invalid location flag
  • checked_by (empty) - Quality control tracker
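Before editing in Google Sheets, it can help to confirm your file actually contains all 14 columns listed above. The sketch below is illustrative only: the `find_missing_columns` helper and the stand-in frame are not part of the course utilities, and in practice you would pass in your own `pd.read_csv(...)` result.

```python
import pandas as pd

# The 14 columns every _processed.csv should contain (from the checklist above)
EXPECTED_COLUMNS = [
    "school_name", "unique_id", "date", "sentences", "roberta_compound",
    "place", "latitude", "longitude", "revised_place", "revised_latitude",
    "revised_longitude", "place_type", "false_positive", "checked_by",
]

def find_missing_columns(df: pd.DataFrame) -> list:
    """Return any expected columns absent from the DataFrame."""
    return [col for col in EXPECTED_COLUMNS if col not in df.columns]

# Stand-in frame for illustration; "checked_by" is deliberately dropped
sample = pd.DataFrame(columns=EXPECTED_COLUMNS[:-1])
print(find_missing_columns(sample))  # → ['checked_by']
```

If the list comes back empty, your file matches the expected structure and the cleaning steps below should work without column errors.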

🔧 Step 3: Data Cleaning Process¶

📊 Open in Google Sheets: Upload your CSV file to Google Sheets for collaborative editing

🔧 Fix Locations: Enter corrected data in the revised_ columns for mistaken locations

🏷️ Categorize Places: Use the place_type column with these standardized categories:

  • Country - National boundaries
  • State - State/province level
  • County - County/region level
  • City - Municipal areas
  • Neighborhood - Local districts
  • University - Educational institutions
  • Building - Specific structures
  • Road - Streets and highways

💡 Pro Tip: Create a data validation dropdown in Google Sheets for consistent place types
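The dropdown tip above prevents typos at entry time; if you skip it, a quick pandas check can flag non-standard place types after the fact. This is a sketch with toy rows, not course code; `VALID_PLACE_TYPES` is simply the standardized list above.

```python
import pandas as pd

# Standardized categories from the list above
VALID_PLACE_TYPES = {
    "Country", "State", "County", "City",
    "Neighborhood", "University", "Building", "Road",
}

# Toy rows for illustration; "university" (lowercase) is a deliberate typo
df = pd.DataFrame({"place_type": ["City", "university", "State", None]})

# Flag filled-in values that are not in the standardized set
filled = df["place_type"].dropna()
invalid = filled[~filled.isin(VALID_PLACE_TYPES)]
print(invalid.tolist())  # → ['university']
```

Anything this check surfaces should be corrected in Google Sheets before you export, since the filtering functions in Part 4 match place types exactly.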

❌ Mark False Positives: Set false_positive to True for sentences that don't actually reference locations

👥 Track Progress: Enter your name in checked_by to coordinate team efforts

💾 Step 4: Export Clean Data¶

  1. Go to File → Download → Comma Separated Values (.csv)
  2. Save as: [institution]_processed_clean.csv in the same folder
    • Example: group_2/python/ODU_processed_clean.csv

⚠️ Note: Delete this cell before turning in the final version.

3.3 Revised Map¶

✍️ Writing Task¶

Explain the major fixes you made to the map and what you learned about the dataset in the process. You can answer any of the following questions:

  • What were some places of note on the campus you investigated?
  • What appeared to be the main concern of many of the posts?
  • What were some places that surprised you?

🔧 Technical Implementation¶

Code Changes Required:

  1. Run the Python code below to display the map. Remember, this will only work if the .csv file was saved properly.
  2. You will have to change the code to properly set it to your group and file.
df_institution_cleaned = pd.read_csv('group_data_packets/group_6/python/UNC_processed_clean.csv')

⚠️ Remember: You are changing the group number and the institution name.

In [17]:
# =============================================================================
# LOAD CLEANED DATA
# =============================================================================
# Load the CSV file you manually cleaned in Google Sheets

# 📝 TO DO: Update these paths for your group
# Replace "group_6" with your group number
# Replace "UNC_processed_clean.csv" with your institution's cleaned file
df_institution_cleaned = pd.read_csv(
    "group_data_packets/group_6/python/UNC_processed_clean.csv"
)
In [18]:
# =============================================================================
# APPLY DATA CLEANING FUNCTIONS
# =============================================================================
# Use our custom function to standardize the cleaned data

# Apply the cleaning function to standardize data types and handle missing values
# This function ensures all datasets have the same format for consistent analysis

df_institution_cleaned = clean_institution_dataframe(df_institution_cleaned)

# Display first few rows to verify the cleaning worked properly
# This shows the structure and sample content of your cleaned data
df_institution_cleaned.head()
DataFrame cleaned successfully!
Out[18]:
school_name unique_id date sentences roberta_compound place latitude longitude revised_place revised_latitude revised_longitude place_type false_positive checked_by
0 UNC UNC_10299 2021-04-17 18:15:17 Missing all of South Campus & all of the apart... -0.674987 501 38.68287 16.10256 501 38.68287 16.10256 Unknown <NA> Not Checked
1 UNC UNC_9515 2024-08-27 20:37:43 Not sure if it was a student or resident but e... -0.601315 Abernathy 33.83230 -101.84295 Abernathy 33.83230 -101.84295 Unknown <NA> Not Checked
2 UNC UNC_1986 2024-04-29 00:02:28 Israel is a key figure for the United States i... -0.026644 Al Bank al Brīţānī lish Sharq al Awsaţ 25.26430 55.29326 Al Bank al Brīţānī lish Sharq al Awsaţ 25.26430 55.29326 Unknown <NA> Not Checked
3 UNC UNC_1987 2024-04-29 00:02:28 Iranian Parliament members came together and c... -0.387580 Al Bank al Brīţānī lish Sharq al Awsaţ 25.26430 55.29326 Al Bank al Brīţānī lish Sharq al Awsaţ 25.26430 55.29326 Unknown <NA> Not Checked
4 UNC UNC_3411 2020-08-27 06:13:30 Alabama pulled ahead with 531 in a week. 0.051380 Alabama 32.75041 -86.75026 Alabama 32.75041 -86.75026 State <NA> Not Checked
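Note that the `false_positive` column in the output above is still `<NA>` for unchecked rows. Before mapping, you may want to drop the rows your team flagged as false positives. A minimal sketch with toy data (your real rows come from the cleaned CSV):

```python
import pandas as pd

# Toy frame: one row flagged as a false positive, one left unchecked (<NA>)
df = pd.DataFrame({
    "sentences": ["Missing all of South Campus", "Israel is a key figure"],
    "false_positive": pd.array([True, pd.NA], dtype="boolean"),
})

# Treat <NA> as "not a false positive" and keep only unflagged rows
df_valid = df[~df["false_positive"].fillna(False)]
print(len(df_valid))  # → 1
```

Filtering this way means unchecked rows are kept; only rows someone explicitly marked True are removed.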

3.4 Map Customization¶

🔧 Map Adjustment Instructions¶

  • The revised map will display below.
  • Tweak the visuals on the map by setting the zoom and the center.
    • Find the snippet below in the px.scatter_map function
    • Change the zoom, lat, and long to set your view.
zoom=2, # Set the zoom level here
center=dict(lat=38.5, lon=-106),  # Set the center of your view here
In [19]:
# =============================================================================
# CREATE CLEANED LOCATION MAP (After Manual Corrections)
# =============================================================================
# This map shows your data AFTER you fixed the location errors
# Compare this to the raw map above to see the improvement!

# STEP 1: Count occurrences using CLEANED/CORRECTED location data
# Now we use 'revised_place' instead of 'place' - these are your corrections!
place_counts = (
    df_institution_cleaned.groupby("revised_place")  # Group by corrected place names
    .agg(
        {
            "revised_place": "count",        # Count mentions of each corrected place
            "revised_latitude": "first",     # Use corrected latitude coordinates
            "revised_longitude": "first",    # Use corrected longitude coordinates
            "place_type": "first",           # Keep place type classification
        }
    )
    .rename(columns={"revised_place": "count"})  # Rename count column for clarity
)

# STEP 2: Prepare data for mapping
place_counts = place_counts.reset_index()  # Make 'revised_place' a regular column

# Remove places without valid corrected coordinates
place_counts = place_counts.dropna(subset=["revised_latitude", "revised_longitude"])

# STEP 3: Create the cleaned location map
fig = px.scatter_map(
    place_counts,
    lat="revised_latitude",          # Use corrected Y-coordinates
    lon="revised_longitude",         # Use corrected X-coordinates
    size="count",                    # Dot size = mention frequency
    hover_name="revised_place",      # Show corrected place name on hover
    hover_data={
        "count": True,               # Show how many mentions
        "place_type": True,          # Show place category
        "revised_latitude": ":.4f",   # Show corrected coordinates
        "revised_longitude": ":.4f",
    },
    size_max=25,                     # Maximum dot size
    title="Cleaned Location Data: Places Mentioned in UNC Reddit Posts",
    zoom=2,                          # 📝 TO DO: Adjust zoom level for your region
    center=dict(lat=38.5, lon=-106), # 📝 TO DO: Adjust center coordinates
)

# STEP 4: Customize map appearance
fig.update_layout(
    map_style="carto-positron",      # Clean, readable map style
    width=800, 
    height=600, 
    title_font_size=16, 
    title_x=0.5
)


# Display with HTML export configuration
fig.show(config={'displayModeBar': True, 'displaylogo': False})

Revision Insights¶

✍️ Write any new spatial insights you arrived at now that you have fixed the data.

Part 4: Spatial Comparison¶

In this section you are going to compare the spatial distribution of JMU and your institution. You will create two maps using the custom create_location_counts() function.

🔧 Function Parameters¶

This function has two key parameters:

  • minimum_count= - Sets the minimum number of times a location must be mentioned before it appears on the map. Setting this to the default 2 means that a location mentioned only once will not register
  • place_type_filter= - This uses the place_type column to filter for only the types of places you want. The default is None, but adding a list of places, i.e. (place_type_filter=["University", "Building"]), only shows those places. In the sample below, only "State", "City", and "Country" places will show up.

⚠️ NOTE: This does mean you would have had to tag place types properly in the cleanup process

💡 Example Usage¶

create_location_counts(
    df_institution_cleaned, 
    minimum_count=2, 
    place_type_filter=["State", "City", "Country"]
)
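The real implementation lives in data_cleaning_utils, but conceptually the two parameters behave like the pandas sketch below. This is toy data and a simplified stand-in, not the course function itself:

```python
import pandas as pd

# Toy data: "Harrisonburg" is mentioned twice, "Dayton" and "JMU" once each
df = pd.DataFrame({
    "revised_place": ["Harrisonburg", "Harrisonburg", "Dayton", "JMU"],
    "place_type": ["City", "City", "City", "University"],
})

minimum_count = 2
place_type_filter = ["State", "City", "Country"]

# Keep only the requested place types, then count mentions per place
counts = (
    df[df["place_type"].isin(place_type_filter)]
    .groupby("revised_place")
    .size()
    .rename("count")
    .reset_index()
)

# Drop places mentioned fewer than minimum_count times
counts = counts[counts["count"] >= minimum_count]
print(counts["revised_place"].tolist())  # → ['Harrisonburg']
```

"JMU" is excluded by the place-type filter and "Dayton" by the minimum count, which is why tagging place types carefully in Part 3 matters here.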

🔧 Customization Instructions¶

⚠️ Important: Make sure that minimum_count and place_type_filter for create_location_counts are the same for both maps.

Visual Customization:

  • Tweak the center and zoom of your map to highlight an important contrast
  • Set the color_discrete_sequence=px.colors.qualitative.Plotly to something of your choice
    • Color Reference: https://plotly.com/python/discrete-color/#color-sequences-in-plotly-express

⚠️ Note: Delete this cell for the final version

4.1 JMU Spatial Distribution¶

In [20]:
# =============================================================================
# JMU SPATIAL DISTRIBUTION MAP
# =============================================================================
# Create a filtered map showing only certain types of places for JMU

# STEP 1: Use custom function to filter and count JMU locations
# This function applies the same filtering to both datasets for fair comparison
JMU_filtered_locations = create_location_counts(
    df_jmu,                          # JMU Reddit data
    minimum_count=2,                 # Only show places mentioned 2+ times
    place_type_filter=['State', 'City', 'Country']  # Only these place types
)


# STEP 2: Create colored scatter map
# Each place type gets a different color to show spatial patterns
fig = px.scatter_map(
    JMU_filtered_locations,
    lat="revised_latitude",
    lon="revised_longitude", 
    size="count",                    # Dot size = mention frequency
    color="place_type",              # Different colors for different place types
    hover_name="revised_place",
    hover_data={
        "count": True,
        "place_type": True,
        "revised_latitude": ":.4f",
        "revised_longitude": ":.4f",
    },
    size_max=25,
    zoom=2,                          # 📝 TO DO: Adjust to highlight interesting patterns
    title="Cleaned Location Data: Places Mentioned in JMU Reddit Posts",
    center=dict(lat=38.5, lon=-106), # 📝 TO DO: Center on area of interest
    color_discrete_sequence=px.colors.qualitative.Plotly  # Categorical color palette
)

# STEP 3: Customize layout
fig.update_layout(
    map_style="carto-positron", 
    width=800, 
    height=600, 
    title_font_size=16, 
    title_x=0.5
)


# Display with HTML export configuration
fig.show(config={'displayModeBar': True, 'displaylogo': False})
In [21]:
# =============================================================================
# YOUR INSTITUTION'S SPATIAL DISTRIBUTION MAP
# =============================================================================
# Create a comparable map for your institution using identical filtering

# STEP 1: Apply the same filtering to your institution's data
# Using identical parameters ensures fair comparison with JMU
institution_filtered_locations = create_location_counts(
    df_institution_cleaned,          # Your cleaned institution data
    minimum_count=2,                 # Same minimum as JMU map
    place_type_filter=["State", "City", "Country"]  # Same place types as JMU
)


# STEP 2: Create matching visualization
# Keep all settings the same as JMU map for direct comparison
fig_institution_cleaned = px.scatter_map(
    institution_filtered_locations,
    lat="revised_latitude",
    lon="revised_longitude",
    size="count",
    color="place_type",              # Same color coding as JMU map
    hover_name="revised_place",
    hover_data={
        "count": True,
        "place_type": True,
        "revised_latitude": ":.4f",
        "revised_longitude": ":.4f",
    },
    size_max=25,                     # Same size scale as JMU
    zoom=2,                          # 📝 TO DO: Adjust for your region
    title="Cleaned Location Data: Places Mentioned in UNC Reddit Posts",  # 📝 TO DO: Update institution name
    center=dict(lat=38.5, lon=-106), # 📝 TO DO: Center on your region
    color_discrete_sequence=px.colors.qualitative.Plotly,  # Same colors as JMU
)

# STEP 3: Apply identical layout settings
fig_institution_cleaned.update_layout(
    map_style="carto-positron",      # Same style as JMU map
    width=800, 
    height=600, 
    title_font_size=16, 
    title_x=0.5
)



# Display with HTML export configuration
fig_institution_cleaned.show(config={'displayModeBar': True, 'displaylogo': False})

4.2 Spatial Analysis¶

✍️ Writing Task¶

Write a paragraph on an important spatial difference or similarity between the two datasets that confirms or complicates your hypothesis.

💡 Example: "While we theorized that UNC would have a significant number of posts about the Southeast, the mapping data does not reveal this. Instead, the Reddit feed rarely speaks about states outside of North Carolina, and when it does it is about institutions out west."

Part 5: Sentiment Analysis Comparison¶

In this section, you are going to compare the sentiments by location for each institution. You are going to do so by first customizing the create_location_sentiment() function.

🔧 Function Parameters¶

This takes the same parameters as the create_location_counts() function above:

  • minimum_count= - Sets the minimum number of times a location must be mentioned before it appears on the map. Setting this to the default 2 means that a location mentioned only once will not register
  • place_type_filter= - This uses the place_type column to filter out only the types of places you want. The default is None, but adding a list of places i.e. (place_type_filter=["University", "Building"]) only shows those places.
    • 💡 Tip: You might consider showing only one type of place if it helps make your argument. For example, if you are investigating school spirit, it makes the most sense to look at Universities and buildings.
  • ⚠️ NOTE: place_type_filter only works if you tagged places properly in the cleanup process.

💡 Example Usage¶

create_location_sentiment(
    df_jmu,
    minimum_count=2,
    place_type_filter=None  # Include all place types
)
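As with the counts function, the internals live in data_cleaning_utils; conceptually, the averaging works like the pandas sketch below. The scores are toy values standing in for the roberta_compound column, and this is a simplified illustration rather than the course function:

```python
import pandas as pd

# Toy data: two posts mention JMU, one mentions Harrisonburg
df = pd.DataFrame({
    "revised_place": ["JMU", "JMU", "Harrisonburg"],
    "roberta_compound": [0.5, 0.75, -0.2],
})

minimum_count = 2

# Average sentiment and mention count per place
sentiment = (
    df.groupby("revised_place")["roberta_compound"]
    .agg(avg_sentiment="mean", count="size")
    .reset_index()
)

# Places with too few mentions give unreliable averages, so drop them
sentiment = sentiment[sentiment["count"] >= minimum_count]
print(sentiment.to_dict("records"))
# → [{'revised_place': 'JMU', 'avg_sentiment': 0.625, 'count': 2}]
```

This is also why the maps size dots by count: a place with many mentions has a more trustworthy average sentiment than one that squeaks past the threshold.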

🔧 Customization Instructions¶

Visual Optimization:

  • Tweak the center and zoom of your map to highlight an important contrast
  • Experiment with different divergent color scales to optimize the visuals
    • Change: RdYlGn in color_continuous_scale="RdYlGn" to something of your choice
    • Color Reference: https://plotly.com/python/builtin-colorscales/#builtin-diverging-color-scales
  • Experiment with different map templates to optimize visuals:
    • Change: carto-positron in map_style="carto-positron"
    • Options include: 'basic', 'carto-darkmatter', 'carto-darkmatter-nolabels', 'carto-positron', 'carto-positron-nolabels', 'carto-voyager', 'carto-voyager-nolabels', 'dark', 'light', 'open-street-map', 'outdoors', 'satellite', 'satellite-streets', 'streets', 'white-bg'

⚠️ Note: Delete this cell for the final version

In [22]:
# =============================================================================
# JMU SENTIMENT ANALYSIS MAP
# =============================================================================
# Shows the EMOTIONAL tone of how JMU students talk about different places
# Red = negative emotions, Green = positive emotions

# STEP 1: Calculate average sentiment scores by location
# This function groups identical locations and averages their sentiment scores
df_jmu_sentiment = create_location_sentiment(
    df_jmu,                          # JMU Reddit data with sentiment scores
    minimum_count=2,                 # Only places mentioned 2+ times (for reliability)
    place_type_filter=None           # Include all place types for comprehensive view
)


# STEP 2: Create sentiment visualization map
# Color represents emotional tone: Green = positive, Red = negative, Yellow = neutral
fig_sentiment = px.scatter_map(
    df_jmu_sentiment,
    lat="revised_latitude",
    lon="revised_longitude",
    size="count",                    # Larger dots = more mentions (more reliable sentiment)
    color="avg_sentiment",           # Color intensity = emotional tone
    color_continuous_scale="RdYlGn", # Red-Yellow-Green scale (Red=negative, Green=positive)
    hover_name="revised_place",
    hover_data={
        "count": True,               # How many posts contributed to this sentiment
        "avg_sentiment": ":.3f",     # Average sentiment score (3 decimal places)
        "place_type": True,
        "revised_latitude": ":.4f",
        "revised_longitude": ":.4f",
    },
    size_max=25,
    zoom=2,                          # 📝 TO DO: Adjust to focus on interesting patterns
    title="Average Sentiment by Location in JMU Reddit Posts",
    center=dict(lat=38.5, lon=-86),  # 📝 TO DO: Center on region of interest
)

# STEP 3: Customize layout for sentiment analysis
fig_sentiment.update_layout(
    map_style="carto-positron",      # Clean background to highlight sentiment colors
    width=800, 
    height=600, 
    title_font_size=16, 
    title_x=0.5
)


# Display with HTML export configuration
fig_sentiment.show(config={'displayModeBar': True, 'displaylogo': False})
In [23]:
# =============================================================================
# YOUR INSTITUTION'S SENTIMENT ANALYSIS MAP
# =============================================================================
# Compare emotional patterns between your institution and JMU

# STEP 1: Calculate sentiment for your institution using identical methods
institution_sentiment = create_location_sentiment(
    df_institution_cleaned,          # Your cleaned institution data
    minimum_count=2,                 # Same minimum as JMU (ensures fair comparison)
    place_type_filter=None           # Same filter as JMU (include all place types)
)


# STEP 2: Create matching sentiment visualization
# Use identical settings to JMU map for direct comparison
fig_institution_sentiment = px.scatter_map(
    institution_sentiment,
    lat="revised_latitude",
    lon="revised_longitude",
    size="count",
    color="avg_sentiment",
    color_continuous_scale="RdYlGn", # Same color scale as JMU map
    hover_name="revised_place",
    hover_data={
        "count": True,
        "avg_sentiment": ":.3f",
        "place_type": True,
        "revised_latitude": ":.4f",
        "revised_longitude": ":.4f",
    },
    size_max=25,                     # Same size scale as JMU
    zoom=2,                          # 📝 TO DO: Adjust for your region
    title="Average Sentiment by Location in UNC Reddit Posts",  # 📝 TO DO: Update institution name
    center=dict(lat=35.5, lon=-80),  # 📝 TO DO: Center on your institution's region
)

# STEP 3: Apply identical layout for comparison
fig_institution_sentiment.update_layout(
    map_style="carto-positron",      # Same background as JMU map
    width=800, 
    height=600, 
    title_font_size=16, 
    title_x=0.5
)



# Display with HTML export configuration
fig_institution_sentiment.show(config={'displayModeBar': True, 'displaylogo': False})

Sentiment Comparison Analysis¶

✍️ Write: You have analyzed the place-based sentiments in both corpora. Reflect on how this analysis confirms or complicates your hypothesis.
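One concrete way to ground this reflection is to put the two sentiment tables side by side for the places both communities mention. A hedged sketch, assuming both tables came out of the same grouping step and share a `revised_place` column (the function name and suffixes are illustrative):

```python
import pandas as pd

def compare_shared_places(jmu, other, suffixes=("_jmu", "_other")):
    """Inner-join two location-sentiment tables on place name and
    compute the sentiment gap for places both corpora mention."""
    merged = jmu.merge(other, on="revised_place", suffixes=suffixes)
    merged["sentiment_gap"] = (
        merged["avg_sentiment" + suffixes[0]]
        - merged["avg_sentiment" + suffixes[1]]
    )
    # Most divergent places first: these are the interesting contrasts
    return merged.sort_values("sentiment_gap", key=abs, ascending=False)
```

Places with a large positive gap are viewed more warmly by JMU students than by the comparison school; a large negative gap runs the other way.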

Part 6: Time Series Animation Analysis¶

Since the data has a time variable, we can also plot changes in place sentiment over time. This can give insight into whether a particular emotion is consistent or if there was a moment when emotions were better or worse about a location.

🔧 Customization Instructions¶

Data Filtering:

  • Adjust minimum_count and place_type_filter to render an animation of the specific slice of the data you are interested in (e.g., for "school spirit" you might keep only universities and buildings with a count over 4)
  • Adjust window_months for smoother transitions between emotions
    • This is a rolling average of emotions rather than emotional average per month

Visual Customization:

  • Change the center and zoom of the map to properly frame the animation at the right scale and location
  • Change color_continuous_scale="RdYlGn" to a different scale if it improves visibility
  • Experiment with different map templates to optimize visuals:
    • Change: carto-positron in map_style="carto-positron"
  • Experiment with animation duration:
    fig_animated.layout.updatemenus[0].buttons[0].args[1]["frame"]["duration"] = 800
    fig_animated.layout.updatemenus[0].buttons[0].args[1]["transition"]["duration"] = 300
    
    • Where 800 and 300 represent milliseconds
  • Experiment with size_max to get the optimal size for the bubbles
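The rolling average that window_months controls can be pictured with a small pandas sketch. This is an illustration of the idea, not the course helper's actual code, and the numbers are made up:

```python
import pandas as pd

# Monthly mean sentiment for one place (illustrative values)
monthly = pd.Series(
    [0.4, -0.2, 0.1, 0.3],
    index=pd.period_range("2024-01", periods=4, freq="M"),
)

# A 3-month rolling window smooths out single-month spikes;
# min_periods=1 lets the earliest months use whatever data exists so far
rolling = monthly.rolling(window=3, min_periods=1).mean()
```

With `window_months=1` you would see raw month-to-month swings; a larger window trades responsiveness for smoother transitions between frames.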
In [24]:
# =============================================================================
# ANIMATED TIME SERIES: SENTIMENT CHANGES OVER TIME
# =============================================================================
# Watch how places accumulate mentions and sentiment changes over time
# This reveals temporal patterns in student discussions

# STEP 1: Prepare animation data with rolling averages
# This function creates monthly frames showing cumulative growth and sentiment trends
institution_animation = create_time_animation_data(
    df_institution_cleaned,          # Your cleaned institution data
    window_months=3,                 # 3-month rolling average (smooths out noise)
    minimum_count=2,                 # Only places with 2+ total mentions
    place_type_filter=None           # Include all place types (📝 TO DO: experiment with filtering)
)

# STEP 2: Create animated scatter map
# Each frame represents one month, showing cumulative mentions and current sentiment
fig_animated = px.scatter_map(
    institution_animation,
    lat="revised_latitude",
    lon="revised_longitude",
    size="cumulative_count",         # Dot size = total mentions up to this point in time
    color="rolling_avg_sentiment",   # Color = 3-month average sentiment (smoother than daily)
    animation_frame="month",         # Each frame = one month of data
    animation_group="revised_place", # Keep same places connected across frames
    hover_name="revised_place",
    hover_data={
        "cumulative_count": True,    # Total mentions so far
        "rolling_avg_sentiment": ":.3f", # Smoothed sentiment score
        "place_type": True,
        "revised_latitude": ":.4f",
        "revised_longitude": ":.4f"
    },
    color_continuous_scale="RdYlGn", # Same sentiment colors as static maps
    size_max=30,                     # Slightly larger max size for animation visibility
    zoom=2,                          # 📝 TO DO: Adjust zoom for your region
    title="Institution Reddit Posts: Cumulative Location Mentions & Rolling Average Sentiment Over Time",
    center=dict(lat=35.5, lon=-80),  # 📝 TO DO: Center on your institution's area
    range_color=[-0.5, 0.5]          # Fixed color range for consistent comparison across time
)

# STEP 3: Customize animation settings and layout
fig_animated.update_layout(
    map_style="carto-positron",
    width=800,
    height=600,
    title_font_size=16,
    title_x=0.5,
    coloraxis_colorbar=dict(         # Customize the sentiment legend
        title="Rolling Avg<br>Sentiment",
        tickmode="linear",
        tick0=-0.5,                  # Start legend at -0.5 (most negative)
        dtick=0.25                   # Tick marks every 0.25 points
    )
)

# STEP 4: Set animation timing (in milliseconds)
# 📝 TO DO: Experiment with these values for optimal viewing
fig_animated.layout.updatemenus[0].buttons[0].args[1]["frame"]["duration"] = 800    # Time between frames
fig_animated.layout.updatemenus[0].buttons[0].args[1]["transition"]["duration"] = 300 # Transition smoothness

# Display with HTML export configuration
fig_animated.show(config={'displayModeBar': True, 'displaylogo': False})
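The `cumulative_count` that drives dot size in the animation is a running total of mentions per place across months. A minimal sketch of how such a column can be derived from the raw posts (column names are assumed from the notebook, not taken from the helper's real code):

```python
import pandas as pd

# Illustrative posts: one row per place mention, with a month column
posts = pd.DataFrame({
    "revised_place": ["Quad", "Quad", "D-Hall", "Quad"],
    "month": ["2024-01", "2024-02", "2024-02", "2024-03"],
})

# Mentions per place per month, then a running total across months
per_month = (posts.groupby(["revised_place", "month"])
                  .size().rename("mentions").reset_index()
                  .sort_values("month"))
per_month["cumulative_count"] = (
    per_month.groupby("revised_place")["mentions"].cumsum()
)
```

Because the count only accumulates, dots in the animation can grow but never shrink; it is the color (rolling sentiment) that moves in both directions.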

✍️ Time Series Analysis¶

Write: Explain how you optimized the visuals for your time series visualization and how it confirms or complicates your hypothesis.

Part 7: Conclusion and Future Research¶

✍️ Final Writing Task¶

Summarize all the research you did and establish whether your hypothesis was confirmed. Undoubtedly, your research will fall short in some respects. Explain how you would improve the data or methods in the future to better test your hypothesis.

📋 Required Elements¶

Your conclusion should address:

  • Hypothesis Confirmation: Did your findings support or contradict your original hypothesis?
  • Research Limitations: What shortcomings did you identify in your analysis?
  • Future Improvements: How would you enhance the data collection or analysis methods?
  • Broader Implications: What do your findings suggest about spatial sentiment analysis?